Goto

Collaborating Authors

 Computational Learning Theory



Robust Regression of General ReLUs with Queries

Neural Information Processing Systems

We study the task of agnostically learning general (as opposed to homogeneous) ReLUs under the Gaussian distribution with respect to the squared loss. In the passive learning setting, recent work gave a computationally efficient algorithm that uses poly(d,1/ฯต)labeled examples and outputs a hypothesis with error O(opt)+ฯต, where optis the squared loss of the best fit ReLU. Here we focus on the interactive setting, where the learner has some form of query access to the labels of unlabeled examples. Our main result is the first computationally efficient learner that uses dpolylog(1/ฯต)+ O(min{1/p,1/ฯต})black-box label queries, where pis the bias of the target function, and achieves error O(opt)+ฯต. We complement our algorithmic result by showing that its query complexity bound is qualitatively near-optimal, even ignoring computational constraints. Finally, we establish that query access is essentially necessary to improve on the label complexity of passive learning. Specifically, for pool-based active learning, any active learner requires โ„ฆ(d/ฯต) labels, unless it draws a super-polynomial number of unlabeled examples.


Less Greedy Equivalence Search

Neural Information Processing Systems

Greedy Equivalence Search (GES) is a classic score-based algorithm for causal discovery from observational data. In the sample limit, it recovers the Markov equivalence class of graphs that describe the data. Still, it faces two challenges in practice: computational cost and finite-sample accuracy. In this paper, we develop Less Greedy Equivalence Search (LGES), a variant of GES that retains its theoretical guarantees while partially addressing these limitations. LGES modifies the greedy step; rather than always applying the highest-scoring insertion, it avoids edge insertions between variables for which the score implies some conditional independence.


a6e072cfc12794cba1e861f57be8f4de-Paper-Conference.pdf

Neural Information Processing Systems

We study a fundamental question of domain generalization: given a family of domains (i.e., data distributions), how many randomly sampled domains do we need to collect data from in order to learn a model that performs reasonably well on every seen and unseen domain in the family? We model this problem in the PAC framework and introduce a new combinatorial measure, which we call the domain shattering dimension. We show that this dimension characterizes the domain sample complexity. Furthermore, we establish a tight quantitative relationship between the domain shattering dimension and the classic VC dimension, demonstrating that every hypothesis class that is learnable in the standard PAC setting is also learnable in our setting.


Agnostic Learning under Targeted Poisoning: Optimal Rates and the Role of Randomness

Neural Information Processing Systems

We study the problem of learning in the presence of an adversary that can corrupt an ฮท fraction of the training examples with the goal of causing failure on a specific test point. In the realizable setting, prior work established that the optimal error under such instance-targeted poisoning attacks scales as ฮ˜(dฮท), where d is the VC dimension of the hypothesis class [Hanneke, Karbasi, Mahmoody, Mehalel, and Moran (NeurIPS 2022)]. In this work, we resolve the corresponding question in the agnostic setting. We show that the optimal excess error is eฮ˜( dฮท), answering one of the main open problems left by Hanneke et al. To achieve this rate, it is necessary to use randomized learners: Hanneke et al. showed that deterministic learners can be forced to suffer error close to 1 even under small amounts of poisoning.


Robust Estimation Under Heterogeneous Corruption Rates Syomantak Chaudhuri University of California, Berkeley Jerry Li University of Washington Thomas A. Courtade University of California, Berkeley

Neural Information Processing Systems

We study the problem of robust estimation under heterogeneous corruption rates, where each sample may be independently corrupted with a known but non-identical probability. This setting arises naturally in distributed and federated learning, crowdsourcing, and sensor networks, yet existing robust estimators typically assume uniform or worst-case corruption, ignoring structural heterogeneity. For mean estimation for multivariate bounded distributions and univariate gaussian distributions, we give tight minimax rates for all heterogeneous corruption patterns. For multivariate gaussian mean estimation and linear regression, we establish the minimax rate for squared error up to a factor of d, where d is the dimension. Roughly, our findings suggest that samples beyond a certain corruption threshold may be discarded by the optimal estimators - this threshold is determined by the empirical distribution of the corruption rates given.


Conservative classifiers do consistently well with improving agents: characterizing statistical and online learning

Neural Information Processing Systems

Machine learning is now ubiquitous in societal decision-making, for example in evaluating job candidates or loan applications, and it is increasingly important to take into account how classified agents will react to the learning algorithms. The majority of recent literature on strategic classification has focused on reducing and countering deceptive behaviors by the classified agents, but recent work of Attias et al. [5] identifies surprising properties of learnability when the agents genuinely improve in order to attain the desirable classification, such as smaller generalization error than standard PAC-learning. In this paper we characterize so-called learnability with improvements across multiple new axes. We introduce an asymmetric variant of minimally consistent concept classes and use it to provide an exact characterization of proper learning with improvements in the realizable setting. While prior work studies learnability only under general, arbitrary agent improvement regions, we give positive results for more natural Euclidean ball improvement sets. In particular, we characterize improper learning under a generative assumption on the data distribution. We further show how to learn in more challenging settings, achieving lower generalization error under well-studied bounded noise models and obtaining mistake bounds in realizable and agnostic online learning. We resolve open questions posed by Attias et al. [5] for both proper and improper learning.


Parsimonious Predictions for Strategyproof Scheduling

Neural Information Processing Systems

We consider the problem of scheduling mjobs on nunrelated strategic machines to minimize the maximum load of any machine. As the machines are strategic they may misreport processing times to minimize their own load. The pioneering work of Nisan and Ronen gave an n-approximate deterministic strategyproof mechanism for this setting, and this was recently shown to be best possible by the breakthrough results of Christodoulou et al. This large approxation guarantee begs the question: how can we avoid these large worst-case results. In this work, we use the powerful framework of algorithms with (machine-learned) predictions to bypass these strong impossibility results. We show how we can predict O(m+n)values to obtain a deterministic strategyproof algorithm whose makespan is within a constant factor of the optimal makespan when the predictions are correct, and O(n) times the optimum no matter how poor the predictions are.


Formal Models of Active Learning from Contrastive Examples

Neural Information Processing Systems

Machine learning can greatly benefit from providing learning algorithms with pairs of contrastive training examples--typically pairs of instances that differ only slightly, yet have different class labels. Intuitively, the difference in the instances helps explain the difference in the class labels. This paper proposes a theoretical framework in which the effect of various types of contrastive examples on active learners is studied formally. The focus is on the sample complexity of learning concept classes and how it is influenced by the choice of contrastive examples. We illustrate our results with geometric concept classes and classes of Boolean functions. Interestingly, we reveal a connection between learning from contrastive examples and the classical model of self-directed learning.


Generalization Bounds for Model-based Algorithm Configuration

Neural Information Processing Systems

Algorithm configuration, which involves selecting algorithm parameters based on sampled problem instances, is a crucial step in applying modern algorithms such as SAT solvers. Although prior work has attempted to understand the theoretical foundations of algorithm configuration, we still lack a comprehensive understanding of why practical algorithm configurators exhibit strong generalization performances in real-world scenarios. In this paper, through the lens of machine learning theory, we provide an algorithm-dependent generalization bound for the widely used model-based algorithm configurators under mild assumptions. Our approach is based on the algorithmic stability framework for generalization bounds. To the best of our knowledge, this is the first generalization bound that applies to a model closely approximating practical model-based algorithm configurators.